Introduction

In the past few years, 5% of Cuba’s population of 11 million was encountered at the U.S. border. Once processed, they begin a process toward becoming permanent residents within a year and settle in different parts of the United States. Many of these migrants maintain connections to family and friends in Cuba, and the remittances that they eventually send back are a significant part of the Cuban economy. The purpose of this report is to explore the distribution of the foreign-born Cuban population in the United States. The intention is to provide context and support for the topic of Cuban migration and remittance landscapes.

The literature review that I performed ahead of this exploratory analysis highlights the importance of remittances to Cuban families. The results of investments in the built environment are as clear as the disinvestment. From personal experience, many Cubans work to send money back home in small ways, like purchasing minutes and data for a phone line, and in big ways, like fronting money for renovation materials, installing a new water tank system, or purchasing a new home altogether. These processes inform an urbanism unique to Cuba, which is the subject of a larger study that could stem from this project.

I believe that taking stock at the most granular level available through Census data will give interested parties insight into the field of remittance landscapes of Cuba and migratory urbanism. From a remittance economics perspective, the foreign-born Cubans moving to the United States constitute the primary agents of remittance flows: they generate the capital with their labor. Usually, when there is more money in the pocket of an immigrant, there is more available to remit.

If we can accept that places can function as sorting mechanisms for the settlement of peoples, then we can assume that migrants will settle where their network leads them. For some geographic areas, I look at where Cubans settle and compare metrics such as the total count of housing units and the gross rent to income ratio. Based on my experience living at the margins in America, the cost of one’s rent relative to one’s income determines the standard of living that can be had. While doing this research, I put myself in my mother’s shoes and wondered: If I had come to the United States, where could I live and keep more of my income? Where could I potentially access my community and have more money to remit?

Moreover, from a business perspective, the resulting tables and plots offer first steps towards creating geomarketing campaigns intended to connect to the U.S.-based Cuban community. If someone has a business that, for instance, provides remittance services, it can get its message to the right places more efficiently.

Lastly, knowing where the population of foreign-born Cubans in the U.S. has changed is useful information for organizations that advocate for the Cuban-American vote and other policy initiatives.

With all these uses in mind, I sought to gain some insight into the changing Cuban-American landscape.

Data Wrangling

To perform the exploratory analysis, I use tidycensus to get the data and its associated geometries. I operate on three scales: core-based statistical areas, counties, and census tracts. It is important to note at this point that the Census defines ‘foreign-born’ populations as ‘anyone who is not a U.S. citizen at birth, including those who become U.S. citizens through naturalization.’ Therefore, I do not study the Cuban-American population (people born in the U.S. who identify as having Cuban heritage).

I compiled a list of variables for the 2009 and 2022 ACS 5-Year Estimates. The codes differ between years, so I built the lists by carefully reading the output of load_variables() and filtering it in RStudio's interactive View(). All of the metropolitan and micropolitan data uses the 2022 estimates. The county and census tract data draws on both 2009 and 2022 in order to calculate the change over time in foreign-born Cubans.
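
The variable search described above can be sketched as follows. This is a minimal illustration, assuming a small hand-made table that mimics the name, label, and concept columns returned by load_variables(); the labels are abbreviated placeholders rather than exact Census labels, and find_vars() is a hypothetical helper, not part of tidycensus.

```r
# Hypothetical stand-in for the output of load_variables(2022, "acs5");
# in a live session this table would come from tidycensus, not be typed by hand.
vars22 <- data.frame(
  name    = c("B05006_143", "B25071_001", "B25035_001"),
  label   = c("Estimate!!Total:!!Americas:!!Latin America:!!Caribbean:!!Cuba",
              "Estimate!!Median gross rent as a percentage of household income",
              "Estimate!!Median year structure built"),
  concept = c("Place of Birth for the Foreign-Born Population",
              "Median Gross Rent as a Percentage of Household Income",
              "Median Year Structure Built"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose label or concept mentions a keyword.
find_vars <- function(vars, keyword) {
  hit <- grepl(keyword, vars$label, ignore.case = TRUE) |
         grepl(keyword, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

cuba_vars <- find_vars(vars22, "cuba")
print(cuba_vars$name)
```

Running the same search against the 2009 table is what surfaces the different code for the same concept in that year.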

Core-Based Statistical Areas: ACS 5-Year Estimates 2022

A metropolitan or micropolitan statistical area contains a core area with a substantial population nucleus, as well as adjacent communities having a high degree of economic and social integration with that core. The term Core Based Statistical Area (CBSA) became effective in 2000 and refers collectively to metropolitan and micropolitan statistical areas. Each metropolitan statistical area must have at least one urbanized area of 50,000 or more inhabitants. Each micropolitan statistical area must have at least one urban cluster of 10,000 to 49,999 inhabitants.

The first Census API call gets all of the core-based statistical areas across the United States using the ACS 5-Year Estimates for 2022, which I then filter based on the count of foreign-born Cubans. I separate the data into metropolitan and micropolitan areas and filter the data frame to create subsets that give context to the existing presence of Cuban migrants.

The call returns metropolitan and micropolitan statistical areas together, so I first separate them. I then use filter() to keep metro areas with 1,000 or more foreign-born Cubans and micro areas with 100 or more. These thresholds were chosen arbitrarily. Lastly, because the majority of Cuban migration is related to Florida, I further split the USA_metros and USA_micros objects into areas in Florida and areas outside of it. I believe this provides more context to the general story of Cuban migration, which is hyper-focused on Miami.

#get variables in core_based_statistical areas across USA

USA_allcba <- get_acs(geography = "metropolitan statistical area/micropolitan statistical area",
                             year = 2022, 
                             variables = acs22_vars, 
                             geometry = TRUE, 
                             output = "wide") 

#Rename the variables for the 2022 data
USA_allcba <- USA_allcba %>%
         rename(total_below_pov=B06012PR_002E,
         tot_vacant22 = B25002_003E,
         med_inc22 = B21004_001E,
         up65_alone22 = B09021_023E,
         tot_housingunits22 =B25136_001E,
         gross_v_inc_perc22 = B25071_001E,
         tot_fb_cuba22 = B05006_143E,
         tot_pa_inc22 = B19057_002E,
         med_yb22 = B25035_001E,
         med_values22 = B25077_001E)%>%
  mutate(vacpct22 = (tot_vacant22/tot_housingunits22)) %>%
  st_as_sf(crs = crs)

#Begin filtering
USA_micros <- USA_allcba %>%
              filter(grepl("Micro", NAME, ignore.case = TRUE)) %>%
              dplyr:: select(NAME, GEOID,tot_housingunits22,gross_v_inc_perc22,tot_fb_cuba22,med_yb22, vacpct22) %>%
              filter(tot_fb_cuba22 >= 100)%>%
              arrange(desc(tot_fb_cuba22))

USA_metros <- USA_allcba %>%
             dplyr:: filter(grepl("Metro", NAME, ignore.case=TRUE)) %>%
             dplyr:: select(NAME, GEOID,tot_housingunits22,gross_v_inc_perc22,tot_fb_cuba22,med_yb22, vacpct22) %>%
              filter(tot_fb_cuba22 >= 1000)%>%
              arrange(desc(tot_fb_cuba22))

#create final objects for kable tables
noFL_micros <- USA_micros %>%
              filter(!grepl("FL M", NAME, ignore.case = TRUE)) 
            

noFL_metros <-USA_metros %>%
              filter(!grepl("FL M" , NAME, ignore.case= TRUE))

fl_micros <- USA_micros %>%
             filter(grepl("FL M", NAME, ignore.case = TRUE)) 
            

fl_metros <-USA_metros %>%
            filter(grepl("FL M" , NAME, ignore.case= TRUE))
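
The "FL M" pattern above relies on the state abbreviation sitting directly before "Metro Area" or "Micro Area", so it would not catch a multi-state area in which FL is not the last abbreviation. A more explicit alternative could look like the sketch below. It assumes names follow the form "Place, ST-ST Metro Area"; cbsa_states() is a hypothetical helper and was not used in this project.

```r
# Hypothetical helper: pull the state abbreviations out of a CBSA name such as
# "Louisville/Jefferson County, KY-IN Metro Area".
cbsa_states <- function(name) {
  # Keep the text between the last comma and the "Metro Area"/"Micro Area" suffix
  st <- sub("^.*,\\s*([A-Z-]+)\\s+(Metro|Micro) Area$", "\\1", name)
  strsplit(st, "-", fixed = TRUE)
}

cbsa_names <- c("Miami-Fort Lauderdale-Pompano Beach, FL Metro Area",
                "Louisville/Jefferson County, KY-IN Metro Area",
                "Key West, FL Micro Area")

# TRUE when Florida is one of the states the area touches
in_florida <- vapply(cbsa_states(cbsa_names), function(s) "FL" %in% s, logical(1))
print(in_florida)
```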

County Data and Calculating Change over Time

To get the original county data, I made an API call for all counties in the United States for the count of foreign-born Cubans in 2009 and in 2022, and calculated the change between the two years for each county. I kept the counties that saw an increase of more than 500 and those that saw a decrease of more than 500, then combined the two sets into one object of 73 counties.

I used the 73 counties to make an iterative get_acs() API call at the tract level, using the process detailed below. The end result is, for each of the counties in question, an object of census-tract-level observations of changes in the count of foreign-born Cubans. I visualize this object as one of the main outputs of this project: an interactive map that shows the changing distribution of foreign-born Cubans in the United States. Again, through the lens of remittance landscapes, these places are in communication with the material world and households in Cuba, which benefit from the work of these migrants.

USA_2009 <- get_acs(geography = "county",
                             year = 2009, 
                             variables = acs09_vars,
                             geometry = TRUE, 
                             output = "wide") 
USA_2009 <- USA_2009 %>%
          rename(
          fb_placebysex.2009 = B06003_013E,
          totalfb_placebycitizenshipstatus.2009 = B05002_013E,
          total_carib_1980.2009 = B05007_039E,
          total_vacancy_O_status.2009 = B25002_003E,
          total_householdtype_relationship.09 = B09016_002E,
          med_hh_inc.09 = B19013_001E,
          hh_65up.09 = B19037A_053E,
          Total_b_100pov.09 = B06012_002E,
          gross_v_income_percentage.09 = B25071_001E,
          med_built_year.09 = B25035_001E,
          tot_publicass_inc.09 = B19057_001E,
          tot_housingunits.09 = B25001_001E,
          cub_fb_total09 = B05006_127E,
          med_houseval09 = B25077_001E
          )%>%
          mutate(vacancyPct.2009 = total_vacancy_O_status.2009/tot_housingunits.09) %>% # Get Vacanct Rate
          st_as_sf(crs = crs)

USA_2022 <- get_acs(geography = "county",
                             year = 2022, 
                             variables = acs22_vars, 
                             geometry = TRUE, 
                             output = "wide") 
USA_2022 <- USA_2022 %>%
         rename(
         tot_below100pov22 = B06012PR_002E,
         tot_vacant22 = B25002_003E,
         med_inc22 = B21004_001E,
         up65_alone22 = B09021_023E,
         tot_housingunits22 =B25136_001E,
         gross_v_inc_perc22 = B25071_001E,
         tot_fb_cuba22 = B05006_143E,
         tot_pa_inc22 = B19057_002E,
         med_yb22 = B25035_001E,
         med_values22 = B25077_001E)%>%
  mutate(vacpct22 = (tot_vacant22/tot_housingunits22)) %>%
  st_as_sf(crs = crs)

#Merge the dataframes

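# Drop the 2022 geometry so this is an ordinary (non-spatial) left join;
# the geometry column that survives the join comes from USA_2009.
USA0922_df <- st_drop_geometry(USA_2022) %>%
         left_join(USA_2009, by = "GEOID") %>%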
         mutate(change_vac_pct = vacpct22 - vacancyPct.2009,
           change_med_inc = med_inc22 - med_hh_inc.09,
           change_med_values = med_values22 - med_houseval09,
           change_count_housingunits= tot_housingunits22-tot_housingunits.09,
           change_cuba_fb = tot_fb_cuba22 - cub_fb_total09,
           change_pct_below100pov = tot_below100pov22 - Total_b_100pov.09)

#Begin Filtering Process of COUNTIES THAT MEET DOUBLE CRITERIA

##decrease
USA0922_df_filtered_decrease <- USA0922_df %>%
              dplyr:: filter(change_cuba_fb < -500)%>%
              dplyr:: select(NAME.x, GEOID, geometry,cub_fb_total09, med_inc22,up65_alone22, tot_housingunits22,   gross_v_inc_perc22, change_count_housingunits, tot_pa_inc22,med_values22, vacpct22, change_med_values,change_cuba_fb, change_med_inc,                         change_vac_pct)

##increase
USA0922_df_filtered_increase <- USA0922_df %>% 
     dplyr::  filter(change_cuba_fb > 500) %>%
     dplyr::  select(NAME.x, GEOID, geometry, med_inc22,up65_alone22,cub_fb_total09, tot_housingunits22, gross_v_inc_perc22,   change_count_housingunits,                              tot_pa_inc22,med_values22, vacpct22, change_med_values, change_med_inc,                         change_vac_pct,change_cuba_fb)
  
#Combine the two dfs to create the filtered all counties object that will be used for the census tract level selected counties pull

change_fbcuban0922<- rbind(USA0922_df_filtered_decrease, USA0922_df_filtered_increase)%>%
  st_as_sf(crs = crs)

Functions to Process the Selected Counties as inputs for their Census Tract Level Data

After reviewing the counties, I chose a change of plus or minus 500 foreign-born Cubans as the cutoff for both increases and decreases. I created an sf object of the counties that gained or lost more than 500 foreign-born Cubans between 2009 and 2022, then used it as the input for an iterative call to get their census-tract-level data.

It is at this level that I desired to create the interactive map below.

It shows the census tract level data for the filtered counties. For additional context, I wanted to add another layer and see the presence of Cubans relative to the boundaries and centers of metropolitan/micropolitan areas.

Processing Name Column to create columns with the County and State Arguments

The chunk below defines the two helper functions necessary to transform the ‘NAME’ column of the county-level get_acs() output so that those counties can be iterated back into a new call for their respective census-tract data.

split_county_state <- function(name) {
  # Split the string on the comma to separate county and state
  parts <- str_split(name, pattern = ",", n = 2, simplify = TRUE)
  county <- trimws(parts[1])
  state <- trimws(parts[2])

  return(list(county = county, state = state))
}

clean_county_name <- function(county_name) {
  # Remove 'County' from the end of the string and any leading/trailing spaces
  cleaned_name <- sub("County$", "", county_name)
  cleaned_name <- trimws(cleaned_name)
  return(cleaned_name)
}



#apply the functions to get the split vectors of 'county' and 'state' arguments

change_fbcuban0922_split <- change_fbcuban0922 %>%
  mutate(
    split_data = map(NAME.x, split_county_state),
    County = map_chr(split_data, "county"),
    State = map_chr(split_data, "state")
  ) %>%
  dplyr::select(-split_data) %>% # Remove the list column after extracting components
  mutate(cleaned_County = sapply(County, clean_county_name)) # Strip the trailing 'County'
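
To illustrate what the two helpers above are expected to return, here is a minimal base-R sketch of the same split-and-clean logic applied to two sample names. split_and_clean() is a hypothetical, vectorized stand-in written for this illustration; it is not the code used in the project.

```r
# Hypothetical vectorized equivalent of split_county_state() + clean_county_name()
split_and_clean <- function(name) {
  county <- trimws(sub(",.*$", "", name))        # text before the first comma
  state  <- trimws(sub("^[^,]*,", "", name))     # text after the first comma
  county <- trimws(sub("County$", "", county))   # drop a trailing "County"
  data.frame(cleaned_County = county, State = state, stringsAsFactors = FALSE)
}

res <- split_and_clean(c("Miami-Dade County, Florida",
                         "Jefferson County, Kentucky"))
print(res)
```

Names that do not end in "County" pass through with only the whitespace trimmed, which is also how clean_county_name() behaves.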

Function Takes New Columns, Performs API Call, Combines Each output Into One Object

The following chunk creates the iterative functions that take the ‘county’ and ‘state’ columns and run a tidycensus get_acs() call for 2009 and for 2022. The variable codes differ between years, so the two calls have to be made separately.

iterate_county22 <- function(df) {
  # List to store the results
  results_list <- list()

  # Loop over each row of the dataframe
  for (i in seq_len(nrow(df))) {
    # Extract county and state from the current row
    County <- df$cleaned_County[i]
    State <- df$State[i]
    
    # Try to fetch ACS data, handling errors
    acs_data <- tryCatch({
      get_acs(
        geography = "tract",  # Make sure geography is correctly specified
        variables = "B05006_143E",  # Foreign-born population born in Cuba (2022 code)
        state = State,
        county = County,
        year = 2022,
        survey = "acs5",
        geometry = TRUE
      )
    }, error = function(e) {
      message("Failed for ", County, ", ", State, ": ", e$message)
      NULL  # Return NULL on failure
    })
    
    # Append the fetched data to the list, if not NULL
    if (!is.null(acs_data)) {
      results_list[[length(results_list) + 1]] <- acs_data
    }
  }

  # Combine all the results into one data frame
  final_data <- bind_rows(results_list)
  return(final_data)
}

iterate_county09 <- function(df) {
  # List to store the results
  results_list <- list()

  # Loop over each row of the dataframe
  for (i in seq_len(nrow(df))) {
    # Extract county and state from the current row
    County <- df$cleaned_County[i]
    State <- df$State[i]
    
    # Try to fetch ACS data, handling errors
    acs_data <- tryCatch({
      get_acs(
        geography = "tract",  # Make sure geography is correctly specified
        variables = "B05006_127E",  # Foreign-born population born in Cuba (2009 code)
        state = State,
        county = County,
        year = 2009,
        survey = "acs5",
        geometry = TRUE
      )
    }, error = function(e) {
      message("Failed for ", County, ", ", State, ": ", e$message)
      NULL  # Return NULL on failure
    })
    
    # Append the fetched data to the list, if not NULL
    if (!is.null(acs_data)) {
      results_list[[length(results_list) + 1]] <- acs_data
    }
  }

  # Combine all the results into one data frame
  final_data <- bind_rows(results_list)
  return(final_data)
}
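
The two functions above differ only in the year and the variable code, so they could be folded into one. The sketch below shows one possible shape, not the code used in this project. iterate_counties() and its fetch argument are hypothetical names; fetch is stubbed out with fake_fetch() so the example runs offline, whereas in a live session it would wrap get_acs().

```r
# Hypothetical generalisation of iterate_county09()/iterate_county22().
# `fetch` is any function(state, county, year, variable) returning a data frame.
iterate_counties <- function(df, year, variable, fetch) {
  results <- lapply(seq_len(nrow(df)), function(i) {
    tryCatch(
      fetch(df$State[i], df$cleaned_County[i], year, variable),
      error = function(e) {
        message("Failed for ", df$cleaned_County[i], ", ", df$State[i], ": ",
                conditionMessage(e))
        NULL  # dropped before the pieces are combined
      }
    )
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(data.frame())
  do.call(rbind, results)
}

# Offline stub standing in for the API call; it fails on purpose for one county.
fake_fetch <- function(state, county, year, variable) {
  if (county == "Nowhere") stop("unknown county")
  data.frame(county = county, state = state, year = year,
             variable = variable, estimate = 1L, stringsAsFactors = FALSE)
}

counties <- data.frame(cleaned_County = c("Miami-Dade", "Nowhere", "Jefferson"),
                       State = c("Florida", "Florida", "Kentucky"),
                       stringsAsFactors = FALSE)
out <- iterate_counties(counties, 2022, "B05006_143", fake_fetch)
print(out)
```

Passing the fetcher in as an argument keeps the loop testable without network access; the failed county is reported through message() and skipped, as in the original functions.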

Applying the Functions to Create 2009 and 2022 Objects

# FEEL FREE TO RUN 'read_objects_from_file' CHUNK TO SAVE TIME
final_acs09_data <- iterate_county09(change_fbcuban0922_split)
final_acs22_data <- iterate_county22(change_fbcuban0922_split)

Transforming Tract Data and Calculating The Change In Foreign Born Cubans

final_acs09_data<-final_acs09_data %>%
                  rename(cub_fb_total09 = estimate)

final_acs22_data<- final_acs22_data %>%
                  rename(tot_fb_cuba22 = estimate)

final_viz_change <- st_drop_geometry(final_acs09_data) %>%
 dplyr::  select(GEOID, cub_fb_total09) %>%  # Select only the columns needed for computing change
  left_join(final_acs22_data, by = "GEOID") %>%
  mutate(change_cuba_fb = tot_fb_cuba22 - cub_fb_total09)

#drop NA values in change; they won't visualize.
final_viz_change_droppedNA <- final_viz_change[!is.na(final_viz_change$change_cuba_fb), ]

final_viz_change_droppedZ_and_NA <- final_viz_change_droppedNA[(final_viz_change_droppedNA$change_cuba_fb) != 0,]

#Add Geometry back
final_viz_change_droppedNA<- final_viz_change_droppedNA%>%
                             st_as_sf(sf_column_name='geometry')

final_viz_change_droppedZ_and_NA<- final_viz_change_droppedZ_and_NA%>%
                             st_as_sf(sf_column_name='geometry')

Data Analysis and Results

Metropolitan / Micropolitan Tables and Summaries

The following tables show the markets that remained after I applied the previously described filters to the metropolitan and micropolitan areas of the United States. They are sorted by the count of foreign-born Cubans, highest first. The green column highlights the counts, and the yellow column shows the gross rent to income ratios for comparison. It is also worth noting the counts of total housing units, as they speak to the scale of the markets.

Disclaimer: because there are outlier markets, for the count of housing units I chose to go with the median rather than the average.
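
The summary values reported in the tables that follow are typed in by hand. As a sketch of how they could instead be computed from the data, the example below uses the seven rows of the Florida micropolitan table from this report. summarise_market() is a hypothetical helper, and using medians for the two count columns and means for the other two mirrors the labels used in the summaries.

```r
# Values copied from the Florida micropolitan table in this report
fl_micro <- data.frame(
  NAME = c("Key West", "Clewiston", "Okeechobee", "Arcadia",
           "Wauchula", "Lake City", "Palatka"),
  tot_housingunits22 = c(54034, 15227, 18496, 15567, 9837, 29835, 36183),
  gross_v_inc_perc22 = c(34.3, 34.8, 28.5, 30.0, 33.6, 28.5, 30.7),
  tot_fb_cuba22      = c(6868, 2811, 471, 251, 237, 131, 127),
  med_yb22           = c(1981, 1987, 1987, 1987, 1984, 1991, 1984),
  stringsAsFactors = FALSE
)

# Hypothetical helper: medians for the skewed counts, means for the other columns
summarise_market <- function(df) {
  data.frame(
    Statistic = c("Median Count Total Housing Units",
                  "Average Rent to Income Ratio (Percentage)",
                  "Median Count of Foreign Born Cubans",
                  "Mean Year Structure Built"),
    Value = c(median(df$tot_housingunits22, na.rm = TRUE),
              mean(df$gross_v_inc_perc22, na.rm = TRUE),
              median(df$tot_fb_cuba22, na.rm = TRUE),
              mean(df$med_yb22, na.rm = TRUE)),
    stringsAsFactors = FALSE
  )
}

stats <- summarise_market(fl_micro)
print(stats)
```

The same helper could be applied to the fl_metros, noFL_metros, and noFL_micros objects after st_drop_geometry(), which would keep the summaries in step with the filters.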

Metropolitan Areas Outside of Florida

Table

# Sorting the data
noFL_metros_sorted <- noFL_metros %>%
  dplyr::arrange(desc(tot_fb_cuba22))

# Creating the table with kable and kableExtra

noFL_metros_kable <- noFL_metros_sorted %>%
  st_drop_geometry()%>%
  dplyr:: mutate(Total_Housing_Units=tot_housingunits22, Rent_Income_Ratio22=gross_v_inc_perc22, Total_FB_Cubans=tot_fb_cuba22, Med_Year_Built= med_yb22)%>%
  dplyr:: select(NAME, Total_Housing_Units,Rent_Income_Ratio22,Total_FB_Cubans, Med_Year_Built )%>%
  kbl(caption = "Metropolitan Areas Outside Florida") %>%
  kable_classic(full_width = F, html_font = "Georgia") %>%
  kable_styling(
    position = "center",
    font_size = 12,
    
  ) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = '#fbc61d')%>%
  column_spec(4, bold = TRUE, background = '#3b914e') 
 

# Print the table to view in an RMarkdown output or similar
noFL_metros_kable
Metropolitan Areas Outside Florida
NAME Total_Housing_Units Rent_Income_Ratio22 Total_FB_Cubans Med_Year_Built
New York-Newark-Jersey City, NY-NJ-PA Metro Area 7981356 31.0 61897 1959
Houston-The Woodlands-Sugar Land, TX Metro Area 2760561 30.5 30963 1991
Las Vegas-Henderson-Paradise, NV Metro Area 923275 32.5 23902 1997
Los Angeles-Long Beach-Anaheim, CA Metro Area 4730219 33.7 16932 1968
Louisville/Jefferson County, KY-IN Metro Area 561271 27.8 13508 1976
Dallas-Fort Worth-Arlington, TX Metro Area 2963281 29.7 12752 1990
Atlanta-Sandy Springs-Alpharetta, GA Metro Area 2420310 30.7 8286 1993
Phoenix-Mesa-Chandler, AZ Metro Area 1996937 29.8 7825 1993
Austin-Round Rock-Georgetown, TX Metro Area 960087 29.1 6941 1999
Chicago-Naperville-Elgin, IL-IN-WI Metro Area 3942534 29.2 6462 1970
Riverside-San Bernardino-Ontario, CA Metro Area 1584750 34.0 4889 1986
Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area 2500311 28.9 4675 1982
New Orleans-Metairie, LA Metro Area 572691 33.3 4152 1976
Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area 2590451 30.3 3664 1965
San Antonio-New Braunfels, TX Metro Area 1015924 30.5 3526 1991
Charlotte-Concord-Gastonia, NC-SC Metro Area 1115218 28.6 3380 1994
Boston-Cambridge-Newton, MA-NH Metro Area 2033504 29.9 2939 1963
Nashville-Davidson–Murfreesboro–Franklin, TN Metro Area 836994 29.7 2617 1992
Kansas City, MO-KS Metro Area 940968 27.7 2168 1978
San Francisco-Oakland-Berkeley, CA Metro Area 1851003 28.4 2073 1967
Albuquerque, NM Metro Area 395967 30.8 2035 1985
Grand Rapids-Kentwood, MI Metro Area 432268 28.9 1966 1978
Lancaster, PA Metro Area 216592 28.1 1917 1977
San Diego-Chula Vista-Carlsbad, CA Metro Area 1230349 33.6 1844 1979
Portland-Vancouver-Hillsboro, OR-WA Metro Area 1036369 30.4 1831 1983
Rochester, NY Metro Area 489741 30.4 1771 1966
Seattle-Tacoma-Bellevue, WA Metro Area 1657075 29.2 1625 1984
Detroit-Warren-Dearborn, MI Metro Area 1905259 29.7 1620 1968
Virginia Beach-Norfolk-Newport News, VA-NC Metro Area 761331 31.1 1547 1982
Baltimore-Columbia-Towson, MD Metro Area 1190378 30.2 1438 1975
Raleigh-Cary, NC Metro Area 581802 28.4 1374 1998
Denver-Aurora-Lakewood, CO Metro Area 1245265 30.5 1348 1985
Midland, TX Metro Area 72376 28.0 1319 1989
Hartford-East Hartford-Middletown, CT Metro Area 521773 30.2 1258 1967
Minneapolis-St. Paul-Bloomington, MN-WI Metro Area 1509511 28.9 1213 1980
Grand Island, NE Metro Area 31716 27.3 1119 1973
Syracuse, NY Metro Area 296553 29.7 1086 1964
Odessa, TX Metro Area 66082 28.7 1067 1981
Lansing-East Lansing, MI Metro Area 236080 29.5 1024 1973
New Haven-Milford, CT Metro Area 371281 31.4 1002 1964

Summary statistics

summary_stats_noflmetros <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio(Percentage)", 
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(1026146, 30.0, 2054, 1980), 0)
)

# Generate kable table
kable(summary_stats_noflmetros, format = "html", col.names = c("Statistic", "Value"), 
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = '#68a6d5')
Summary Statistics
Statistic Value
Median Count Total Housing Units 1026146
Average Rent to Income Ratio(Percentage) 30
Median Count of Foreign Born Cubans 2054
Mean Year Structure Built 1980

Metropolitan Areas in Florida

Table

# Sorting the data
fl_metros_sorted <- fl_metros %>%
  dplyr::arrange(desc(tot_fb_cuba22))

# Creating the table with kable and kableExtra

fl_metro_kable <- fl_metros_sorted %>%
  st_drop_geometry()%>%
  dplyr:: mutate(Total_Housing_Units=tot_housingunits22, Rent_Income_Ratio22=gross_v_inc_perc22, Total_FB_Cubans=tot_fb_cuba22, Med_Year_Built= med_yb22)%>%
  dplyr:: select(NAME, Total_Housing_Units,Rent_Income_Ratio22,Total_FB_Cubans, Med_Year_Built )%>%
 kableExtra::  kbl(caption = "Florida Metropolitan Areas") %>%
  kable_classic(full_width = F, html_font = "Georgia") %>%
  kable_styling(
    position = "center",
    font_size = 12,
    
  ) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = '#fbc61d')%>%
  column_spec(4, bold = TRUE, background = '#3b914e') 

# Print the table to view in an RMarkdown output or similar
fl_metro_kable
Florida Metropolitan Areas
NAME Total_Housing_Units Rent_Income_Ratio22 Total_FB_Cubans Med_Year_Built
Miami-Fort Lauderdale-Pompano Beach, FL Metro Area 2643202 36.8 777702 1982
Tampa-St. Petersburg-Clearwater, FL Metro Area 1471328 32.4 81108 1985
Orlando-Kissimmee-Sanford, FL Metro Area 1094927 33.7 33255 1993
Cape Coral-Fort Myers, FL Metro Area 419916 33.5 31798 1994
Naples-Marco Island, FL Metro Area 229814 35.9 18677 1995
Jacksonville, FL Metro Area 695854 31.0 8893 1990
North Port-Sarasota-Bradenton, FL Metro Area 462959 32.9 8672 1988
Lakeland-Winter Haven, FL Metro Area 320023 30.8 7372 1990
Port St. Lucie, FL Metro Area 231647 33.7 5223 1989
Palm Bay-Melbourne-Titusville, FL Metro Area 290314 32.1 3420 1987
Deltona-Daytona Beach-Ormond Beach, FL Metro Area 330161 33.4 3219 1988
Ocala, FL Metro Area 179079 30.0 3022 1991
Sebring-Avon Park, FL Metro Area 57605 29.9 2137 1986
Gainesville, FL Metro Area 152302 35.8 2122 1988
Sebastian-Vero Beach, FL Metro Area 83801 34.4 1609 1990
Punta Gorda, FL Metro Area 111330 35.4 1500 1989
Tallahassee, FL Metro Area 173637 33.8 1165 1988
Pensacola-Ferry Pass-Brent, FL Metro Area 222708 29.9 1014 1988

Summary Statistics

summary_stats_flmetros <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio(Percentage)", 
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(260981, 33.08, 4322, 1989), 0)
)

# Generate kable table
kable(summary_stats_flmetros, format = "html", col.names = c("Statistic", "Value"), 
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = '#68a6d5')
Summary Statistics
Statistic Value
Median Count Total Housing Units 260981
Average Rent to Income Ratio(Percentage) 33
Median Count of Foreign Born Cubans 4322
Mean Year Structure Built 1989

Micropolitan Areas in Florida

Table

# Sorting the data
fl_micros_sorted <- fl_micros %>%
  dplyr::arrange(desc(tot_fb_cuba22))

# Creating the table with kable and kableExtra

fl_micros_kable <- fl_micros_sorted %>%
  st_drop_geometry()%>%
  dplyr:: mutate(Total_Housing_Units=tot_housingunits22, Rent_Income_Ratio22=gross_v_inc_perc22, Total_FB_Cubans=tot_fb_cuba22, Med_Year_Built= med_yb22)%>%
  dplyr:: select(NAME, Total_Housing_Units,Rent_Income_Ratio22,Total_FB_Cubans, Med_Year_Built )%>%
  kbl(caption = "Florida Micropolitan Areas") %>%
  kable_classic(full_width = F, html_font = "Georgia") %>%
  kable_styling(
    position = "center",
    font_size = 12,
    
  ) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = '#fbc61d')%>%
  column_spec(4, bold = TRUE, background = '#3b914e')

# Print the table to view in an RMarkdown output or similar
fl_micros_kable
Florida Micropolitan Areas
NAME Total_Housing_Units Rent_Income_Ratio22 Total_FB_Cubans Med_Year_Built
Key West, FL Micro Area 54034 34.3 6868 1981
Clewiston, FL Micro Area 15227 34.8 2811 1987
Okeechobee, FL Micro Area 18496 28.5 471 1987
Arcadia, FL Micro Area 15567 30.0 251 1987
Wauchula, FL Micro Area 9837 33.6 237 1984
Lake City, FL Micro Area 29835 28.5 131 1991
Palatka, FL Micro Area 36183 30.7 127 1984

Summary Statistics

summary_stats_flmicro <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio", 
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(18496, 31.49, 251, 1986),0)
)

# Generate kable table
kable(summary_stats_flmicro, format = "html", col.names = c("Statistic", "Value"), 
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
 row_spec(0, bold = TRUE, background = '#68a6d5')
Summary Statistics
Statistic Value
Median Count Total Housing Units 18496
Average Rent to Income Ratio 31
Median Count of Foreign Born Cubans 251
Mean Year Structure Built 1986

Micropolitan Areas Outside of Florida

Table

# Sorting the data
noFL_micros_sorted <- noFL_micros %>%
  dplyr::arrange(desc(tot_fb_cuba22))

# Creating the table with kable and kableExtra

noFL_micros_kable <- noFL_micros_sorted %>%
  st_drop_geometry()%>%
  dplyr:: mutate(Total_Housing_Units=tot_housingunits22, Rent_Income_Ratio22=gross_v_inc_perc22, Total_FB_Cubans=tot_fb_cuba22, Med_Year_Built= med_yb22)%>%
  dplyr:: select(NAME, Total_Housing_Units,Rent_Income_Ratio22,Total_FB_Cubans, Med_Year_Built )%>%
  kbl(caption = "Micropolitan Areas Outside of Florida") %>%
  kable_classic(full_width = F, html_font = "Georgia") %>%
  kable_styling(
    position = "center",
    font_size = 12,
    
  ) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = '#fbc61d')%>%
  column_spec(4, bold = TRUE, background = '#3b914e')
 

# Print the table to view in an RMarkdown output or similar
noFL_micros_kable
Micropolitan Areas Outside of Florida
NAME Total_Housing_Units Rent_Income_Ratio22 Total_FB_Cubans Med_Year_Built
Moultrie, GA Micro Area 19143 28.9 458 1985
Dumas, TX Micro Area 8184 22.3 422 1973
Columbus, NE Micro Area 14085 26.0 412 1972
Storm Lake, IA Micro Area 8197 22.2 369 1962
Hastings, NE Micro Area 13804 27.4 361 1965
Norfolk, NE Micro Area 20832 24.5 352 1971
Hobbs, NM Micro Area 27854 26.3 290 1975
Alamogordo, NM Micro Area 32244 27.9 270 1984
Hereford, TX Micro Area 6983 20.5 249 1969
Dodge City, KS Micro Area 12568 21.2 246 1971
Georgetown, SC Micro Area 36219 30.4 199 1991
Austin, MN Micro Area 16933 27.3 197 1957
Shelby, NC Micro Area 43782 30.1 168 1979
Yankton, SD Micro Area 10405 24.5 160 1976
Toccoa, GA Micro Area 12342 24.5 155 1982
Guymon, OK Micro Area 8448 23.3 151 1974
Richmond-Berea, KY Micro Area 46221 28.3 147 1990
Douglas, GA Micro Area 20864 25.7 133 1988
Jasper, IN Micro Area 24185 24.3 124 1978
Lexington, NE Micro Area 11049 21.9 123 1967
Hermiston-Pendleton, OR Micro Area 35988 24.6 119 1977
Ottumwa, IA Micro Area 15754 29.2 118 1957
Newport, TN Micro Area 17833 28.1 117 1986
Washington Court House, OH Micro Area 12685 25.0 115 1970
McMinnville, TN Micro Area 18164 26.9 108 1978
Seymour, IN Micro Area 19144 26.4 103 1978
Lumberton, NC Micro Area 48811 29.3 100 1985

Summary Statistics

summary_stats_nofl_micros <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio(Percentage)", 
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(17833, 25.81, 160, 1976),0)
)

# Generate kable table
kable(summary_stats_nofl_micros, format = "html", col.names = c("Statistic", "Value"), 
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = '#68a6d5')
Summary Statistics
Statistic Value
Median Count Total Housing Units 17833
Average Rent to Income Ratio(Percentage) 26
Median Count of Foreign Born Cubans 160
Mean Year Structure Built 1976

Correlations/Regressions

For each of the models that I ran, I looked at how the total count of foreign-born Cubans was related to the variables shown above (total number of housing units, gross rent to income ratio, and median year structures were built). The regressions below lack predictive power, highlighting a need for better feature engineering to predict which markets attract Cuban immigrants.

At times the total count of housing units showed up as a significant predictor (p-value < 0.001), but its coefficient was too small to make a noticeable difference in the count of foreign-born Cubans per additional housing unit. The analysis would benefit from more preparation to catch outliers and from a larger number of observations.
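
One possible way to act on both points, offered as a sketch rather than as the method used in this report: Cook's distance can flag influential markets before a model is interpreted, and rescaling the slope makes a per-housing-unit coefficient easier to read. The data are the first six rows of the Florida metropolitan table (names shortened); the 4/n cutoff is a common rule of thumb, not a formal test.

```r
# First six rows of the Florida metropolitan table in this report
fl <- data.frame(
  NAME = c("Miami", "Tampa", "Orlando", "Cape Coral", "Naples", "Jacksonville"),
  tot_housingunits22 = c(2643202, 1471328, 1094927, 419916, 229814, 695854),
  tot_fb_cuba22      = c(777702, 81108, 33255, 31798, 18677, 8893),
  stringsAsFactors = FALSE
)

fit <- lm(tot_fb_cuba22 ~ tot_housingunits22, data = fl)

# Cook's distance summarises how much the fitted values move when one
# observation is dropped from the model.
cooks <- cooks.distance(fit)
flagged <- fl$NAME[cooks > 4 / nrow(fl)]

# Rescale the slope so it reads as foreign-born Cubans per 1,000 housing units
slope_per_1000 <- coef(fit)[["tot_housingunits22"]] * 1000

print(round(cooks, 2))
print(flagged)
print(slope_per_1000)
```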

Below are some results from the exploratory analysis.

Outside of Florida: Metropolitan Areas Regression

The total count of housing units was a significant predictor until I removed the New York/Newark metropolitan area from the model’s observations. Even when it was significant, its effect was small: each additional housing unit was associated with a change of less than one person in the count.

Below is a scatter plot showing that the metropolitan areas outside of Florida where Cuban immigrants tend to live have fewer than 3 million housing units, with a visible concentration in areas with fewer than 2 million. This may mean that Cubans tend to settle in smaller markets than a world city like New York.

# Remove the New York/Newark outlier for a more readable scatter plot of housing units

noFL_metros <- noFL_metros %>%
  filter(!str_detect(NAME, 'New York'))

# Regress the count of foreign-born Cubans on the three housing variables
noFL_metro_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data = noFL_metros)

summary(noFL_metro_fit)

# x axis: foreign-born Cubans; y axis: total housing units
plot(noFL_metros$tot_fb_cuba22, noFL_metros$tot_housingunits22)
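Outliers like the one removed above can also be flagged from the fitted model instead of by eye. This is a minimal sketch on synthetic data, assuming a simple one-predictor model; the data frame `toy`, its columns, and "Big Metro" are hypothetical stand-ins, and the 4 / n cutoff is only a common rule of thumb, not a method used in this report.

```r
# Synthetic metros; every value here is invented for illustration.
set.seed(2)
toy <- data.frame(
  NAME = paste("Metro", 1:30),
  units = round(runif(30, 5e4, 2e6))
)
toy$cubans <- 500 + 0.001 * toy$units + rnorm(30, sd = 400)

# One deliberately extreme metro, standing in for a very large market.
toy <- rbind(toy, data.frame(NAME = "Big Metro", units = 8e6, cubans = 60000))

fit <- lm(cubans ~ units, data = toy)

# Cook's distance measures how much each observation moves the fitted model.
cd <- cooks.distance(fit)

# Rule of thumb: flag observations with Cook's distance above 4 / n.
flagged <- toy$NAME[cd > 4 / nrow(toy)]
flagged
```

Flagged areas can then be inspected one by one before deciding whether to drop them, which keeps the exclusion decision explicit.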

Florida: Metropolitan Areas Regression

When it comes to Florida, Tampa and Miami are clear outliers, so I removed them and ran the same regression as above. The adjusted R-squared was 0.74, suggesting a stronger model with the few variables that we have. Again, the strong predictors were housing units (p-value < 0.005) and the median year structures were built (p-value < 0.002).

Below is a scatterplot of the total count of housing units and the total count of foreign-born Cubans.

# Filter out Miami and Tampa for the regression and scatterplot
fl_metros <- fl_metros %>%
  filter(!str_detect(NAME, 'Miami|Tampa'))

fl_metros_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data = fl_metros)
summary(fl_metros_fit)

# x axis: foreign-born Cubans; y axis: total housing units
plot(fl_metros$tot_fb_cuba22, fl_metros$tot_housingunits22)

Outside of Florida: Micropolitan Areas

The resulting regression did not return any strong predictors. This could be due to the distribution of the data, seen below. It makes sense given that we are talking about micropolitan areas. My theory is that the decision to settle in these areas is due to factors that would require deeper investigation than one using Census data. They may be more nuanced and network-related. Interviews and oral histories would best shed light on this aspect of the Cuban migration story.

noFl_micro_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data=noFL_micros)
summary(noFl_micro_fit)
plot(noFL_micros$tot_housingunits22, noFL_micros$tot_fb_cuba22)

Florida: Micropolitan Areas Regression

There are too few observations to make any meaningful statistical inferences for this subset of data. However, as with the micropolitan areas outside of Florida, it would be worth investigating the roots of their Cuban communities more deeply.

Florida_micros_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data=fl_micros)
summary(Florida_micros_fit)
plot(fl_micros$tot_housingunits22, fl_micros$tot_fb_cuba22)

Regressions: County and Census Tracts in 2009 on the Count of Change

In the following analysis I examine the relationship between the count of foreign-born Cubans in 2009 and the change observed by 2022. I look at the county and census tract levels using the filtered objects from the Census API calls.

Census Tracts

In the final dataset used to generate the interactive map, there are 4549 census tracts that do not have NA or 0 values of change in the count of Cubans between 2009 and 2022. The chunk below computes the Spearman and Pearson coefficients, -0.46 and 0.10 respectively, for the relationship between the count of Cubans in 2009 and the change observed by 2022. At the census tract level, the count of Cubans in 2009 could not by itself predict the change in counts observed by 2022.

spearman<-cor(final_viz_change_droppedZ_and_NA$cub_fb_total09, final_viz_change_droppedZ_and_NA$change_cuba_fb, method="spearman")  
pearson<- cor(final_viz_change_droppedZ_and_NA$cub_fb_total09, final_viz_change_droppedZ_and_NA$change_cuba_fb, method="pearson")       #method can be 

final_tracts_fit <- lm(change_cuba_fb ~ cub_fb_total09,
data=final_viz_change_droppedZ_and_NA)
summary(final_tracts_fit)
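Spearman and Pearson coefficients can disagree, even in sign, because Pearson works on the raw values and is sensitive to a few extreme observations, while Spearman works on ranks. The toy vectors below are invented to show that behavior; they are not the tract data, and this sketch does not claim to explain the specific coefficients reported above.

```r
# Twenty points with a perfectly decreasing relationship,
# plus one extreme point far above the rest on both axes.
x <- c(1:20, 500)
y <- c(20:1, 400)

# Rank-based correlation: the decreasing pattern among the 20 points dominates.
rho_spearman <- cor(x, y, method = "spearman")

# Value-based correlation: the single extreme point dominates.
r_pearson <- cor(x, y, method = "pearson")

rho_spearman  # negative
r_pearson     # positive
```

When the two coefficients point in different directions, plotting the data and looking for a handful of very large values is a reasonable first check.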

County Level

For this analysis I first removed an outlier by keeping only the counties that had fewer than 100,000 foreign-born Cubans in 2009. I then went further and kept only those with at most 10,000. The slight linear relationship between the presence of the community in 2009 and the changes that were observed gives some credence to the idea that people move based on their network. Perhaps the scale of the existing community shapes how much migration a place absorbs. For instance, Miami has a huge presence and gets the highest rises in counts. Perhaps that is an underlying characteristic of migration processes, one that sounds like the first law of geography. Places that already have migrants from a particular region will receive more of their people, but places with an even larger presence will absorb more.

I ran the same tests as above. The Pearson and Spearman correlations were 0.64 and 0.31, respectively, when I kept only counties that had at most 10,000 Cubans in 2009. When I add the outliers back in, the Pearson coefficient increases to 0.94, and the regression returns an adjusted R-squared of 0.88 and a p-value < 0.0001 for the count of foreign-born Cubans in 2009.

A note on Miami: it is an outlier as a value, but it is a very significant piece of Cuban migration history, and its high number only speaks to the scale of its connection to Cuban immigrants. It naturally supports the idea that people are still following their networks, as the regression shows. Even without the presence of such a strong magnet, Cuban communities still grew slightly where they already were. But some counties decreased in their counts of foreign-born Cubans. Why are they leaving?

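# Object with the counties: 2009 count and change by 2022
change_fbcuban0922$cub_fb_total09
change_fbcuban0922$change_cuba_fb

# Working copy for the correlations. No filter is applied on this line, so
# outliers such as Miami are included; filter on cub_fb_total09 here to
# reproduce the restricted results.
change_fbcuban0922reg <- change_fbcuban0922

cor(change_fbcuban0922reg$cub_fb_total09,
    change_fbcuban0922reg$change_cuba_fb, method = 'spearman')
cor(change_fbcuban0922reg$cub_fb_total09,
    change_fbcuban0922reg$change_cuba_fb, method = 'pearson')

# Distribution of the change in counts
hist(change_fbcuban0922reg$change_cuba_fb)

# x axis: 2009 count; y axis: change by 2022
plot(change_fbcuban0922reg$cub_fb_total09,
     change_fbcuban0922reg$change_cuba_fb)

# Regression on the full county object (outliers included)
final_counties_fit <- lm(change_cuba_fb ~ cub_fb_total09,
                         data = change_fbcuban0922)
summary(final_counties_fit)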
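Because the county results change depending on which counties are kept, it can help to compute the same statistics under each cutoff side by side. The sketch below does this on synthetic data; `toy`, `count09`, `change`, and the helper `fit_with_cap` are all hypothetical names introduced for illustration, and the numbers it produces are unrelated to the results reported above.

```r
# Synthetic county data; every value here is invented for illustration.
set.seed(3)
toy <- data.frame(count09 = round(runif(40, 50, 9000)))
toy$change <- round(0.2 * toy$count09 + rnorm(40, sd = 600))

# One very large county standing in for a dominant destination.
toy <- rbind(toy, data.frame(count09 = 500000, change = 150000))

# Hypothetical helper: refit on counties at or below a 2009 count cap.
fit_with_cap <- function(df, cap = Inf) {
  sub <- df[df$count09 <= cap, ]
  fit <- lm(change ~ count09, data = sub)
  c(n = nrow(sub),
    pearson = cor(sub$count09, sub$change),
    adj_r2 = summary(fit)$adj.r.squared)
}

# One row per cutoff, so the effect of the exclusion is visible at a glance.
sensitivity <- rbind(all = fit_with_cap(toy),
                     capped = fit_with_cap(toy, cap = 10000))
sensitivity
```

A table like this makes it clear how much of a strong correlation rests on one or two very large counties.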

Discussion and Conclusion

Census Tract Level Interactive Map

The interactive map below is my favorite part about combining curiosity about a research question with an understanding of how to leverage computer programming to create a cool visual tool. Below are the census tracts across the United States that show changes in their count of foreign-born Cubans between 2009 and 2022. When you click a tract, the value of change will pop up. Another key piece of this interactive map is that it allows us to see the relationship between Cuban migration and the structure of the metropolitan region. When the tracts are outside of urban centers, what does it mean? When they are near city centers, how long have they been there? What kind of neighborhood is it? Overall, I think this tool can help researchers of Cuban history consider differently the footprint of the Cuban diaspora relative to the centuries of history that defined the U.S.-Havana relationship, especially the waves of migration that have taken place over the past 65 years.

Conclusion

To restate, the literature review highlights the importance of remittances to Cuban families, and this report looks into the agents of those remittance flows: foreign-born Cubans in the United States.

I set out to explore the data using summary statistics and, where possible, regressions. At the county level, the count of immigrants in 2009 was a significant predictor of changes in the count of foreign-born Cubans. However, that was when I included Miami; when I exclude it, the relationship is less direct. I believe that the inclusion is acceptable given Miami's historic relationship to Cuban migration.

Having the list of tables. Louisville. Markets that stand out.

The question about change in the count of foreign-born Cubans speaks to the distribution of the Cuban diaspora in America. It also lays the groundwork for further spatial analysis and strategic organizing efforts.

The county-level finding, that the presence of the community in 2009 predicts later change, supports the idea that people are still following their networks. The tables can be used to find the markets with the best gross rent to income ratios.